add CPU platforms to instances #8728
Conversation
RFD 505 proposes that instances should be able to set a "minimum hardware platform" or "minimum CPU platform" that allows users to constrain an instance to run on sleds that have a specific set of CPU features available. This allows a user to opt a VM into advanced hardware features (e.g. AVX-512 support) by constraining it to run only on sleds that support those features. For this to work, Nexus needs to understand what CPUs are present in which sleds. Have sled-agent query CPUID to get CPU vendor and family information and report this to Nexus as part of the sled hardware manifest.
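To illustrate the sled-agent side of that, a classification along these lines would do it. This is a sketch using the `raw_cpuid` crate (which this PR's comments mention adopting); the variant names and the family checks are simplifications, not the PR's actual code:

```rust
use raw_cpuid::CpuId;

/// Illustrative stand-in for the real enum reported to Nexus.
#[derive(Debug)]
enum SledCpuFamily {
    AmdMilan,
    AmdTurin,
    Unknown,
}

fn classify_host_cpu() -> SledCpuFamily {
    let cpuid = CpuId::new();
    let vendor = cpuid.get_vendor_info();
    let vendor = vendor.as_ref().map(|v| v.as_str());
    // family_id() folds the extended family ID in, so Zen 3 reports 0x19.
    let family = cpuid.get_feature_info().map(|f| f.family_id());
    match (vendor, family) {
        // Real classification would also consult model ranges: family 0x19
        // spans Zen 3 and Zen 4 parts, for example.
        (Some("AuthenticAMD"), Some(0x19)) => SledCpuFamily::AmdMilan,
        (Some("AuthenticAMD"), Some(0x1A)) => SledCpuFamily::AmdTurin,
        _ => SledCpuFamily::Unknown,
    }
}
```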
force-pushed from fb1e0a7 to 620d7c9
the existing plumbing was sufficient for sled-agent to report the CPU family at startup, but did not provide the CPU family when Nexus calls later for inventory collections. when you've upgraded to this version, the database migration sets the sled CPU family to `unknown`, expecting that the next inventory collection will figure things out. this doesn't happen, and the initial check-in doesn't update the CPU type either (presumably because the sled is already known and initialized from the control plane's perspective?). this does... most of the plumbing to report a sled's CPU family for inventory collection, but it doesn't actually work.

`SledCpuFamily` being in both `omicron-common` and `nexus-client` is kind of unworkable. probably need a `ConvertInto` or something to transform the shared type into the `nexus-client` type when needed? i've been trying to figure out what exactly is necessary and what is just building a mess for myself for two hours, and this feels like it's going nowhere.
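(One low-tech way out of the duplicated-type problem is a plain `From` impl from the shared type to the generated client type, living in the `nexus-client` crate so the orphan rule is satisfied. A sketch — the module paths and variant set here are guesses, not the PR's code:

```rust
use omicron_common::api::internal::shared::SledCpuFamily as SharedCpuFamily;

impl From<SharedCpuFamily> for crate::types::SledCpuFamily {
    fn from(value: SharedCpuFamily) -> Self {
        // An exhaustive match, so adding a variant to the shared enum
        // forces this conversion to be updated too.
        match value {
            SharedCpuFamily::Unknown => Self::Unknown,
            SharedCpuFamily::AmdMilan => Self::AmdMilan,
            SharedCpuFamily::AmdTurin => Self::AmdTurin,
        }
    }
}
```
)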
finally got to taking a fine-tooth comb through the CPUID bits here and the differences between hardware and what guests currently see. for the most part this is in line with what guests already get from bhyve defaults, but i've noticed a few typos that unsurprisingly do not pose an issue booting at least Alpine guests. i'll clean that up and update RFD 314 appropriately tomorrow.
nexus/src/app/instance_platform.rs
Outdated
// See [RFD 314](https://314.rfd.oxide.computer/) section 6 for all the
// gnarly details.
const MILAN_CPUID: [CpuidEntry; 32] = [
    cpuid_leaf!(0x0, 0x0000000D, 0x68747541, 0x444D4163, 0x69746E65),
a guest currently sees `eax=0x10` here, where leaves `0xe`, `0xf`, and `0x10` are all zeroes. `0xe` is reserved and zero on the host. `0xf` and `0x10` are zeroed because of the default bhyve masking behavior. setting `eax=0xd` is just more precise: leaves `0xf` and `0x10` being "present" but zero does not communicate any feature presence.
i know we reference RFD 314 but it kinda seems like it would be nice to have the comment explaining this here?
i went a bit hard on the comment and now there's like 600 more lines of more-a-code-than-a-comment (https://github.com/oxidecomputer/omicron/pull/8728/files#diff-0c754f739cd972f46b539d2a2e5a6220cd0b72e8c9fe7f20a2013fab3f28aa21R167)
nexus/src/app/instance_platform.rs
Outdated
    0xB, 0x0, 0x00000001, 0x00000002, 0x00000100, 0x00000000
),
cpuid_subleaf!(
    0xB, 0x1, 0x00000000, 0x00000000, 0x00000201, 0x00000000
this is the same as RFD 314 says, but `eax` ought to be different in the RFD. Propolis fixes up `B.1.EBX` here to the number of vCPUs, but `eax` of 0 implies that the subleaf is actually invalid (from the APM: "If this function is executed with an unimplemented level (passed in ECX), the instruction returns all zeroes in the EAX register." .. also taken faithfully, this implies the VM's topology is vCPU-many sockets with SMT pairs across each socket pair. oops).
we should probably update the RFD to reflect that? also lol Oxide is the ONLY platform that lets you make a 64-socket VM...
well, it turns out the RFD is fine here, there's a later sentence that says "Index 1's `eax` and `ebx` depend on the VM shape". But Propolis is wrong here: oxidecomputer/propolis#939
So I'm going to hide this leaf in the proposed Milan v1 platform, fix the Propolis bug, and we can expose leaf Bh in a v2.
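(To make the topology reading concrete, here's a decode of the two quoted subleaves per the documented leaf-0xB register layout — an illustrative helper, not PR code:

```rust
/// Decode one leaf-0xB subleaf: EAX[4:0] is the x2APIC-ID shift to the
/// next topology level, EBX[15:0] the logical processor count at this
/// level, ECX[15:8] the level type (0 = invalid, 1 = SMT, 2 = core).
fn decode_leaf_b(eax: u32, ebx: u32, ecx: u32) -> (u32, u32, u32) {
    (eax & 0x1F, ebx & 0xFFFF, (ecx >> 8) & 0xFF)
}

fn main() {
    // Subleaf 0 as quoted: shift=1, count=2, type=1 (SMT) -> SMT pairs.
    assert_eq!(decode_leaf_b(0x0000_0001, 0x0000_0002, 0x0000_0100), (1, 2, 1));
    // Subleaf 1 as quoted: the core level type (2) is present, but EAX=0
    // is what the APM reserves for an unimplemented level -- the bug
    // discussed above.
    assert_eq!(decode_leaf_b(0x0000_0000, 0x0000_0000, 0x0000_0201), (0, 0, 2));
}
```
)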
RFD 505 proposes that instances should be able to set a "minimum hardware platform" or "minimum CPU platform" that allows users to constrain an instance to run on sleds that have a specific set of CPU features available. Previously, actually-available CPU information was plumbed from sleds to Nexus. This actually adds a `min_cpu_platform` setting for instance creation and uses it to drive selection of guest CPUID leaves.

As-is, this moves VMs on Gimlets away from the bhyve-default CPUID leaves (which are effectively "host CPUID information, but features that are not or cannot be virtualized are masked out"), instead using the specific CPUID information set out in RFD 505. There is no provision for Turin yet, which instead gets CPUID leaves that look like Milan. Adding a set of CPUID information to advertise for an `amd_turin` CPU platform, from here, is fairly straightforward.

This does not have a mechanism to enforce specific CPU platform use or disuse, either in a silo or rack-wide. One could imagine a simple system oriented around "this silo is permitted to specify these minimum CPU platforms", but that leaves uncomfortable issues like: if silo A permits only Milan, silo B permits Milan and Turin, all Milan CPUs are allocated already, and someone is attempting to create a new Milan-based VM in silo A, should this succeed using Turin CPUs, potentially starving silo B?
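For concreteness, the user-facing surface is roughly this — a trimmed sketch of the create-time parameter, where the serialization details are assumptions based on the `amd_turin` naming above, not an excerpt from the PR:

```rust
use serde::Deserialize;

/// CPU platforms a user can request at instance create time.
#[derive(Deserialize)]
#[serde(rename_all = "snake_case")]
enum InstanceCpuPlatform {
    AmdMilan, // "amd_milan"
    AmdTurin, // "amd_turin"
}

/// Trimmed to the field this PR adds.
#[derive(Deserialize)]
struct InstanceCreate {
    /// `None`: sled selection ignores CPU constraints, and the control
    /// plane picks a "lowest common denominator" platform at start time.
    cpu_platform: Option<InstanceCpuPlatform>,
}
```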
…platform as defined in Nexus
force-pushed from 620d7c9 to 5cf7b9c
@@ -612,6 +613,7 @@ mod tests {
    external_ips: vec![],
    disks: vec![],
    boot_disk: None,
    cpu_platform: None,
sure is weird that this file doesn't include a change to `SledReservationConstraintBuilder` like in `instance_start.rs`, huh?
that's a bug. this currently would allow a Turin to migrate onto a Milan (bad)
I've done a cursory initial review. Beyond the things you've already identified, this change seems like it's going in the right direction.
/// hypervisor support may be masked off. See RFD 314 for specific CPU features
/// in a CPU platform.
Is RFD 314 public, and do we intend to make it public? If we're referencing it in a comment that will eventually make it into the public API docs, perhaps we should?
 * If this is NULL, the control plane ignores CPU constraints when selecting
 * a sled for this instance. Then, once it has selected a sled, it supplies
 * a "lowest common denominator" CPU platform that is compatible with that
 * sled to maximize the number of sleds the VM can migrate to.
turbo nit, sorry: it might be nice to have a similar comment in the db model `Instance` type --- when reading the type definition, i wasn't sure why this was null, so had to go check dbinit for it
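(Something like this on the model type would cover it — a sketch of a possible doc comment, not the PR's diff; the enum is a placeholder:

```rust
/// Placeholder standing in for the real db enum.
pub enum CpuPlatform {
    AmdMilan,
    AmdTurin,
}

pub struct Instance {
    /// `None` means the user supplied no CPU platform at create time; a
    /// concrete platform is then chosen per-VMM when the instance starts,
    /// based on the sled that ends up hosting it (see the column comment
    /// in dbinit.sql).
    pub cpu_platform: Option<CpuPlatform>,
}
```
)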
KnownVersion::new(175, "add-instance-cpu-platform"),
KnownVersion::new(174, "sled-cpu-family"),
take it or leave it, but i think it would've been fine to do this in a single migration --- though maybe separating them is nicer since they change different tables...
this is actually a weird github thing.. as a draft this was on top of another PR that added sled CPU family reporting, which is merged, and i think was migration 174! i thought this would evaporate out of the diff. i'll have to take a closer look at what's going on with that..
ah ok, so when landing the sled CPU family PR i had to bump the schema number a bit so as to actually make it the latest, and i did not rebase this PR on the commit that finally went in. i've merged `main` and this whole diff is much more reasonable now.
// VMMs get the "sled default" CPU platform when an instance starts
// up on a sled that hasn't reported a well-known CPU family. Assume
// that nothing is known about the VM's compatible CPU platforms in
// this case.
Self::SledDefault => None,
this...feels like some combination of reasonable and sketchy. i kinda wonder if we ought to just never schedule instances on sleds that haven't reported a well-known CPU family? i want to not have a situation where a VM experiences a weird blip in virtual platform due to getting scheduled on a Mystery Sled..
yeah.. this is kind of a provision to make test environments not require Exactly A Zen 3 Or Zen 5 Or Carry Your Own Patches - to get here you have to have been scheduled on a `SledCpuFamily::Unknown`, and for that to have happened either we've shipped a `sled-agent` that doesn't know how to identify a sled's CPU anymore, or we've shipped a sled that `sled-agent` doesn't know about and placed instances on it before updating it. (even if you set the instance CPU platform to `null` through the API you end up at the minimum compatible platform for the selected sled, so there's no way to get here from just API actions)
the good/bad news is that in this case you won't be able to migrate the instance anyway, so it's a "blip" in that the instance will just be this way until you stop and start it again..
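(The mapping under discussion has roughly this shape — the type, variant, and method names here are illustrative, not the PR's code; `SledDefault` is the only way to end up with `None`:

```rust
/// Illustrative stand-in for the VMM-side platform type.
enum VmmCpuPlatform {
    AmdMilan,
    AmdTurin,
    SledDefault,
}

#[derive(Debug, PartialEq)]
enum SledCpuFamily {
    AmdMilan,
    AmdTurin,
}

impl VmmCpuPlatform {
    /// Sled CPU families that can host a VMM using this platform, or
    /// `None` if nothing is known (the Mystery Sled case above).
    fn compatible_sled_families(&self) -> Option<&'static [SledCpuFamily]> {
        match self {
            // Turin hosts can present the Milan profile (per RFD 314), so
            // a Milan-platform VMM can land on either family...
            Self::AmdMilan => {
                Some(&[SledCpuFamily::AmdMilan, SledCpuFamily::AmdTurin])
            }
            // ...but not vice versa: a Turin-platform VMM must not
            // migrate onto a Milan sled.
            Self::AmdTurin => Some(&[SledCpuFamily::AmdTurin]),
            // VMMs get the "sled default" platform when scheduled on a
            // sled that reported an unknown family; assume nothing.
            Self::SledDefault => None,
        }
    }
}
```
)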
", | ||
); | ||
|
||
// TODO(gjc): eww. the correct way to do this is to write this as |
`// TODO(gjc):` 😢
Greg would be glad to know that the way to spell this that Diesel and Cockroach are happy about is fully baffling: 75ed627
// If the instance didn't supply a CPU platform, select one for this VMM by
// looking up the chosen sled and selecting the "minimum compatible
// platform" for sleds of that lineage. This maximizes the number of sleds
// that can host the VMM if it needs to migrate in the future. Selecting the
// sled first and then deriving the platform is meant to support
// heterogeneous deployments: if a deployment contains some sleds with CPUs
// from vendor A, and some with CPUs from vendor B, then selecting the sled
// first implicitly chooses a vendor, and then the "minimum compatible"
// computation selects the most compatible platform that can run on sleds
// with CPUs from that vendor.
I initially felt a bit uncomfy about leaving it unbound until the instance is started (mainly just because I was trying to think my way out of having the db column be nullable), but I think that not picking something until the instance is started is the right move, so that it can be based on the sled that ends up hosting the instance, and we can select that sled in a less constrain-y way.
/// The CPU platform to be used for this instance. If this is `null`, the
/// instance requires no particular CPU platform.
Since this comment generates API docs, perhaps we should discuss what happens if you set it to `null` here? I would want to know that "the instance requires no particular CPU platform" also means "...and will be implicitly bound to the CPU platform of whatever the first sled to host it is", rather than meaning "we will always let you schedule that instance anywhere and if the guest is confused by this it's your fault"
"we will always let you schedule that instance anywhere and if the guest is confused by this it's your fault"
that is actually what it means lol, so yes definitely this deserves a note. the platform that's bound is on the VMM, specifically so that we can't migrate you to a different kind of system once your instance is running. but if you stop it and start it again without an explicit CPU platform you can go anywhere.
sled-hardware/types/src/lib.rs
Outdated
impl std::fmt::Display for SledCpuFamily {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}", self.as_str())
turbo nit: slightly simpler to just do

```
- write!(f, "{}", self.as_str())
+ f.write_str(self.as_str())
```
* leaf 1 should *not* have x2APIC (yet)
* leaf 7 EDX bit 4 indicates FSRM, should be set
also rustfmt, resolve warnings, and fill in explicit zero leaves so that the RFD 314-defined CPUID table matches what the builder produces now. yay!!
frankly i did not realize how nice being able to write a `raw_cpuid::CpuId` would be, and how much more legible that makes this PR (and RFD 314, and future CPU platforms, and now we can pretty trivially make Propolis validate CPUID bits it's told, and ...)
}

#[test]
fn milan_rfd314_is_as_described() {
the other cool thing we can do now is grab a CPUID dump from a guest before using explicit definitions and include a test describing how it and the RFD 314 definitions differ. i want to include that alongside this test before merging (would have been so much nicer than the manual comparison i did before...)
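(Something like the following would capture that — a sketch with hypothetical names; the comparison helper and the stand-in tables are not from the PR:

```rust
use std::collections::BTreeMap;

/// (leaf, subleaf) -> [eax, ebx, ecx, edx]
type CpuidTable = BTreeMap<(u32, u32), [u32; 4]>;

/// Entries present in either table whose register values differ.
fn differing_entries(a: &CpuidTable, b: &CpuidTable) -> Vec<(u32, u32)> {
    let keys: std::collections::BTreeSet<_> =
        a.keys().chain(b.keys()).copied().collect();
    keys.into_iter().filter(|k| a.get(k) != b.get(k)).collect()
}

#[test]
fn legacy_dump_differs_only_where_documented() {
    // Stand-in tables; the real test would load a captured guest dump
    // and the RFD 314 builder output.
    let legacy = CpuidTable::from([(
        (0x0, 0x0),
        [0x10, 0x68747541, 0x444D4163, 0x69746E65],
    )]);
    let rfd314 = CpuidTable::from([(
        (0x0, 0x0),
        [0x0D, 0x68747541, 0x444D4163, 0x69746E65],
    )]);
    // Only the documented difference (leaf 0 EAX: 0x10 -> 0xD) shows up.
    assert_eq!(differing_entries(&legacy, &rfd314), vec![(0x0, 0x0)]);
}
```
)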
short history here: this bit is set in hardware, the extended space is not supported by bhyve, but the bit was passed through. hiding this bit "should" be fine, but out of caution we're not doing that in the first statically-defined CPU platform. this bit was not set in the `MILAN_CPUID` blob, though, even though the RFD was updated.
these tests are:
* the `milan_rfd314` function computes something that matches RFD 314
* the RFD 314 definition differs in specific and justified ways from pre-314 guest CPUID profiles.
For a reason I can't figure out, when this was

```
AND sled.cpu_family IN (...)
```

binding a parameter made that query end up like

```
AND sled.cpu_family IN ($1,)
```

which then expects a single `sled_cpu_family` enum for later binding. Binding an array then yielded a type error like

```
DatabaseError(Unknown, "invalid cast: sled_cpu_family[] -> sled_cpu_family")
```

... but writing it like `AND sled.cpu_family = ANY (...)` does not get the extra comma and instead renders like `ANY ($1)`, which happily takes a `sled_cpu_family[]`.
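For the curious, the shape of the fix looks roughly like this — a sketch, not the actual commit: the table is simplified and the column is typed as `Text` here, where the real code binds the `sled_cpu_family` enum:

```rust
use diesel::dsl::sql;
use diesel::prelude::*;
use diesel::sql_types::{Array, Bool, Text};

diesel::table! {
    sled (id) {
        id -> Int8,
        cpu_family -> Text,
    }
}

fn sleds_with_compatible_cpus(
    conn: &mut PgConnection,
    families: Vec<String>,
) -> QueryResult<Vec<i64>> {
    sled::table
        // `= ANY ($1)` takes the whole array as one bind parameter;
        // `IN ($1,)` would demand a single scalar value instead.
        .filter(
            sql::<Bool>("sled.cpu_family = ANY (")
                .bind::<Array<Text>, _>(families)
                .sql(")"),
        )
        .select(sled::id)
        .load(conn)
}
```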
this materializes RFD 314 and, in some respects, 505.
builds on #8725 for CPU family information, which is a stand-in for the notion of sled families and generations described in RFD 314. There are a few important details here where CPU platforms differ from the sled CPU family and I've differed from 314/505 (and need to update the RFDs to match). I'd not noticed the sheer volume of comments on https://github.com/oxidecomputer/rfd/pull/515 so I'm taking a pass through those, and the exact bits in `MILAN_CPUID` may be further tweaked. I suspect the fixed array needs at least a few more tweaks anyway; cross-referencing RFD 314 turns out to make for awkward review and it's hard to eyeball the semantics of bits here (or which are to be filled in by some later component of the stack!)

As-is: I think this would be OK to merge but it's not quite as polished as I'd like it to be, so it's a real PR but I expect further changes.
hardware CPU families are less linear than Oxide CPU platforms.
We can (and do, in RFD 314) define Milan restrictively enough that we can present Turin (and probably later!) CPUs to guests "as if" they were Milan. Similarly I'd expect that Turin would be defined as roughly "Milan-plus-some-AVX-512-features" and pretty forward-compatible. Importantly these are related to but not directly representative of real CPUs; as an example I'd expect "Turin"-the-instance-CPU-platform to be able to run on a Turin Dense CPU. Conversely, there's probably not a reason to define a "Turin Dense" CPU platform since from a guest perspective they'd look about the same.
But at the same time, the lineage through the AMD server part family kind of splits at Zen 4, with Zen 4- vs Zen 4c-based parts, and similarly with Zen 5/5c. It's somewhat hard (I think) to predict what workloads would be sensitive to this. And as #8730 gets into a bit, the details of a processor's packaging (core topology, frequency, cache size) can vary substantially even inside one CPU family. The important detail here is that we do not expect CPU platforms to cover these details, and it would probably be cumbersome to try; if the instance's constraint is "I want AVX256, and I want to be on high-frequency-capable processors only", then it doesn't actually matter if it's run on a Turin or a Milan, and tying it to one CPU platform may be overly restrictive.
On instance CPU platforms, the hope is that by focusing on CPU features we're able to present a more linear path as the microarchitectures grow.
instance platforms aren't "minimum"
I've walked back the initial description of an instance's CPU platform as the "minimum CPU platform". As presented in other systems, "minimum CPU platform" would more analogously mean "can we put you on a Rome Gimlet or must we put you on a Milan Gimlet?", or "Genoa Cosmo vs Turin Cosmo?" - it doesn't seem possible to say "this instance must have AVX-512, but otherwise I don't care what kind of hardware it runs on", but that's more what we mean by CPU platform.
In a "minimum CPU platform" interpretation, we could provide a bunch of Turin CPUID bits to a VM that said it wanted Milan. But since there's no upper bound here, if an OS has an issue with a future "Zen 14" or whatever, a user would discover that by their "minimum-Milan" instance getting scheduled on the new space-age processor and exploding on boot or something. OSes shouldn't do that, but...
Implementation-wise, this is really just about the names right now. You always get Milan CPUID leaves for the time being. When there are Turin CPUID leaves defined for the instance CPU platform, and Cosmos on which they make sense, this becomes more concrete.
"are these CPU platforms compatible?"
RFD 314 deserves a section outlining how we determine a new CPU can be a stand-in for an older platform, talking about this and other leaves (particularly if we pass topology information through, elsewhere). I've added a few words about the problem of deciding whether a CPUID profile is reasonable to present on hardware with a different CPUID profile, but I've also outlined what I've spotted as potential incompatibilities in the currently-dead `functionally_same()`. This is one of those things that will have the shape of "very boring and straightforward for eight years, and then a bit will change (or be defined) that's kind of annoying".
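To sketch the shape of that check — this is an illustration of the idea, not the PR's `functionally_same()` — treat registers as feature masks and require that everything the platform advertises is present on the host, while remembering that some registers are values, not masks, and will need special-casing (exactly the "a bit will change" annoyance):

```rust
use std::collections::BTreeMap;

/// (leaf, subleaf) -> [eax, ebx, ecx, edx]
type CpuidTable = BTreeMap<(u32, u32), [u32; 4]>;

/// Can `host` present `platform`'s profile to guests? True when every bit
/// the platform advertises is also set on the host. NOTE: this is only
/// sound for registers that really are feature masks; value-like registers
/// (max leaf, topology counts, cache sizes) need dedicated comparisons.
fn can_stand_in(platform: &CpuidTable, host: &CpuidTable) -> bool {
    platform.iter().all(|(key, plat)| match host.get(key) {
        // Every advertised bit must be present on the host; the host may
        // support more than the platform exposes.
        Some(host) => plat.iter().zip(host).all(|(p, h)| p & h == *p),
        // A leaf the host doesn't implement must be all-zero in the
        // platform definition.
        None => plat.iter().all(|r| *r == 0),
    })
}
```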